Central Business Name Index - Zefix

Swiss commerce register

The Federal Office of Justice maintains ZEFIX, the swiss commerce register for all legally operating businesses.

The register provides us with company name, type, description, and address.
ZEFIX is also available as Linked Data.

Setup

SPARQL endpoints

For companies data

Swiss commerce register data can be accessed with SPARQL queries.
You can send queries using HTTP requests. The API endpoint is https://lindas.admin.ch/query/.

For geodata

To understand companies' location, we will work with Swiss geodata. It can be accessed with GeoSPARQL under https://ld.geo.admin.ch/query.

SPARQL client

Let's use SparqlClient from graphly to communicate with both databases. Graphly will allow us to:

  • send SPARQL queries
  • automatically add prefixes to all queries
  • format response to pandas or geopandas
In [1]:
import json
import os.path

import folium
import mapclassify
import matplotlib as mpl
import matplotlib.pyplot as plt
import pandas as pd
import geopandas as gpd
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots

from graphly.api_client import SparqlClient

%matplotlib inline
In [2]:
# Uncomment to install dependencies in Colab environment
#!pip install mapclassify
#!pip install git+https://github.com/zazuko/graphly.git
In [3]:
sparql = SparqlClient("https://lindas.admin.ch/query")
geosparql = SparqlClient("https://ld.geo.admin.ch/query")

sparql.add_prefixes({
    "schema": "<http://schema.org/>",
    "admin": "<https://schema.ld.admin.ch/>"
})

geosparql.add_prefixes({
    "dct": "<http://purl.org/dc/terms/>",
    "geonames": "<http://www.geonames.org/ontology#>",
    "schema": "<http://schema.org/>",
    "geosparql": "<http://www.opengis.net/ont/geosparql#>",
})

SPARQL queries can become very long. To improve the readibility, we will work wih prefixes.

Using the add_prefixes method, we can define persistent prefixes. Every time you send a query, graphly will now automatically add the prefixes for you.

Companies by company type

Swiss law offers many opportnities to business owners. They can choose from several legal entites. What entities can we find in ZEFIX? Which ones are the most popular among enterpreneurs?

In [4]:
query = """
SELECT ?type (COUNT(DISTINCT ?company_iri) AS ?ccount)
WHERE {
    ?company_iri a admin:ZefixOrganisation.
    ?company_iri schema:additionalType/schema:name ?type .
  
  FILTER(LANG(?type) = "de")
}
GROUP BY ?type
ORDER BY DESC(?ccount)
"""

df = sparql.send_query(query)
In [5]:
# Let's rename variables to english

de2en = {'Kommanditgesellschaft': "Limited Partnership",
         'Ausländische Niederlassung im Handelsregister eingetragen': "Foreign Branch",
         'Genossenschaft': "Cooperative",
         'Verein': "Association",
         'Kollektivgesellschaft': "General Partnership",
         'Schweizerische Zweigniederlassung im Handelsregister eingetragen': "Swiss Branch",
         'Stiftung': "Foundation",
         'Einzelunternehmen': "Sole proprietorship",
         'Aktiengesellschaft': "Joint-stock Company",
         'Gesellschaft mit beschränkter Haftung GMBH / SARL': "Limited Liability Company"}

df["type"] = df["type"].apply(lambda x: de2en[x] if x in de2en else x)
In [6]:
fig = px.bar(df[df.ccount > 500], y="type", x="ccount", orientation = "h", labels={"type": "", "ccount": "Company Count"})
fig.update_layout(
    title='Which company type is most popular?', 
    title_x=0.5,
)
fig.update_layout(bargap=0.40)
fig.show()

Companies by municipality

The economic activity varies significantly across regions. What are the most popular places to register a company?

In [7]:
query = """
SELECT ?municipality ?muni_id (COUNT(?sub) AS ?companies)
FROM <https://lindas.admin.ch/foj/zefix>
FROM <https://lindas.admin.ch/territorial>
WHERE {
	?sub a admin:ZefixOrganisation ;
      admin:municipality ?muni_id.
    ?muni_id schema:name ?municipality;
} 
GROUP BY ?municipality ?muni_id
ORDER BY DESC(?companies)
"""

df = sparql.send_query(query)
In [8]:
# Communes
query = """    
SELECT ?muni_id ?population ?boundary 

WHERE {
  ?muni_iri dct:hasVersion ?version ;
            geonames:featureCode geonames:A.ADM3 .
  
  ?version schema:validUntil "2020-12-31"^^<http://www.w3.org/2001/XMLSchema#date>;
           geonames:population ?population;
           geosparql:hasGeometry/geosparql:asWKT ?boundary.
  
  BIND(IRI(REPLACE(STR(?muni_iri), "https://ld.geo.admin.ch/boundaries/", "https://ld.admin.ch/")) AS ?muni_id)
}
"""
communes = geosparql.send_query(query)
communes = communes.set_crs(epsg=4326)
communes["center"] = communes.to_crs(epsg=3035).centroid.to_crs(epsg=4326)
In [9]:
join = pd.merge(communes, df, how="inner", on="muni_id")
join.sort_values(by="companies", ascending=False, inplace=True)
join = join.reset_index(drop=True)
join[["municipality", "companies"]]
Out[9]:
municipality companies
0 Zürich 45015
1 Genève 29387
2 Zug 17080
3 Basel 16102
4 Lugano 13679
... ... ...
2133 Seehof 2
2134 Rongellen 2
2135 Bister 1
2136 Schelten 1
2137 Rebévelier 1

2138 rows × 2 columns

The most popular places are the biggest cities.

This should come at no surprise. The more people live in a certain area, the more business founders it will have. The company density is expected to be directly correlated with population density.

Now, let's take into account the population density. Which regions have most companies per 10.000 inhabitants?

In [10]:
join["companies_per_inhabitants"] = round(join.companies/join.population*10000)
join["companies_per_inhabitants"] = join["companies_per_inhabitants"].astype(int)
join.sort_values(by="companies_per_inhabitants", ascending=False, inplace=True)
join = join.reset_index(drop=True)
join[["municipality", "companies_per_inhabitants"]].head()
Out[10]:
municipality companies_per_inhabitants
0 Zug 5592
1 Soyhières 4480
2 Manno 3497
3 Feusisberg 3449
4 Chiasso 3381

Interesting! Zug is the only bigger city that appears on the top of the list.

Where are the most popular registration destinations?

In [11]:
# Companies per 10.000 inhabitants

style_function = lambda x: {'fillColor': '#ffffff', 
                            'color':'#000000', 
                            'fillOpacity': 0.1, 
                            'weight': 0.1}

highlight_function = lambda x: {'fillColor': '#989898', 
                                'color':'#000000', 
                                'fillOpacity': 0.8}

def plot_heatmap(df, viz_column, hover_cols, col2label, title):
    
    classifier = mapclassify.NaturalBreaks(y=df[viz_column], k=5)
    bins = [df[viz_column].min()] + list(classifier.bins)
    cols = ["municipality", viz_column, *hover_cols]

    m = folium.Map(location=[46.83, 8.13], zoom_start=8, tiles="cartodbpositron")

    folium.Choropleth(
        geo_data=json.loads(df.to_json()),
        data=df,
        columns=cols,
        key_on="feature.properties.municipality",
        fill_color="YlOrRd",
        fill_opacity=1,
        line_weight=0,
        smooth_factor=0,
        bins=bins,
        reset=True,
        legend_name=col2label[viz_column],
    ).add_to(m)

    hover = folium.features.GeoJson(
        df,
        style_function=style_function, 
        control=False,
        highlight_function=highlight_function,
        tooltip=folium.features.GeoJsonTooltip(
            fields=cols,
            aliases=[col2label[i] for i in cols],
            style=("background-color: white; color: #333333; font-family: arial; font-size: 12px; padding: 10px;") 
        )
    )

    folium.LayerControl().add_to(m)
    m.add_child(hover)
    m.keep_in_front(hover)
    
    title_html = '''<h3 align="center" style="font-size:16px"><b>{}</b></h3>'''.format(title)   
    m.get_root().html.add_child(folium.Element(title_html))

    return m
In [12]:
join = join.drop(columns="center")
col2label = {"municipality": "Municipality", "companies_per_inhabitants": "Companies per 10.000 inhabitants", "companies": "Companies ", "population": "Population"}
plot_heatmap(join, "companies_per_inhabitants", ["companies", "population"], col2label, "Where are most companies registered?")
Out[12]:
Make this Notebook Trusted to load map: File -> Trust Notebook

Companies by address

We know which municipalities have the most registered companies. Now, let's dive one level deeper.

Every company has to register its business address. So what swiss addresses are most beloved among companies?

In [13]:
query = """
SELECT ?street ?zip ?town (COUNT(?company) AS ?ccount)
FROM <https://lindas.admin.ch/foj/zefix>
WHERE {
  ?company a <https://schema.ld.admin.ch/ZefixOrganisation>;
           <http://schema.org/address> ?address.
  
  ?address <http://schema.org/streetAddress> ?street;
           <http://www.w3.org/ns/locn#postCode> ?zip;
           <http://www.w3.org/ns/locn#postName> ?town.
}
GROUP BY ?town ?zip ?street
ORDER BY DESC (?ccount)
LIMIT 15
"""

df = sparql.send_query(query)
df
Out[13]:
street zip town ccount
0 c/o Telan AG\nMurbacherstrasse 37 6003 Luzern 399
1 Sihleggstrasse 23 8832 Wollerau 330
2 Bahnhofstrasse 21 6300 Zug 225
3 Haldenstrasse 5 6340 Baar 215
4 Neuhofstrasse 5A 6340 Baar 193
5 Dammstrasse 19 6300 Zug 191
6 Gotthardstrasse 26 6300 Zug 182
7 Technoparkstrasse 1 8005 Zürich 180
8 Roosstrasse 53 8832 Wollerau 173
9 Industriestrasse 47 6300 Zug 171
10 Alte Steinhauserstrasse 1 6330 Cham 156
11 Churerstrasse 35 9470 Buchs SG 154
12 Chamerstrasse 172 6300 Zug 153
13 Oberneuhofstrasse 5 6340 Baar 152
14 Birkenstrasse 47 6343 Rotkreuz 144

Wow! Over 400 companies are registered under the address of Telan AG in Luzern.

Some of those addresses represent big offices. For example, Technoparkstrasse 1 in Zurich is one of the biggest office buildings in the city.

Other addresses are simply registrations used by many, many people. Take a look, and google some of the addresses in Zug. How many companies can you find operating under this address?


Thus far we had a look at individual buildings. Now, let's aggregate this result on the street level. What are the most crowded streets with company registrations? Can we find some business centers?

In [14]:
query = """
SELECT ?thoroughfare ?zip ?town (COUNT(?company) AS ?ccount)
FROM <https://lindas.admin.ch/foj/zefix>
WHERE {
  ?company a <https://schema.ld.admin.ch/ZefixOrganisation>;
           <http://schema.org/address> ?address.
  
  ?address <http://www.w3.org/ns/locn#thoroughfare> ?thoroughfare;
           <http://www.w3.org/ns/locn#postCode> ?zip;
           <http://www.w3.org/ns/locn#postName> ?town.
}
GROUP BY ?town ?zip ?thoroughfare
HAVING (?ccount >= 100)
ORDER BY DESC (?ccount)
"""
df = sparql.send_query(query, timeout=60)
df.head(15)
Out[14]:
thoroughfare zip town ccount
0 Baarerstrasse 6300 Zug 2961
1 Bahnhofstrasse 6300 Zug 1402
2 Bösch 6331 Hünenberg 954
3 Seefeldstrasse 8008 Zürich 820
4 Chamerstrasse 6300 Zug 775
5 Zugerstrasse 6340 Baar 747
6 Corso San Gottardo 6830 Chiasso 723
7 Churerstrasse 8808 Pfäffikon SZ 708
8 Bahnhofstrasse 8001 Zürich 693
9 Industriestrasse 6300 Zug 653
10 rue du Rhône 1204 Genève 653
11 Poststrasse 6300 Zug 637
12 Baarerstrasse 6302 Zug 603
13 Rue du Rhône 1204 Genève 577
14 Gewerbestrasse 6330 Cham 572

Slightly different! We see clearly that Zug is the favourite destination for company registrations.

What are other popular locations?

In [15]:
def geocode(df: pd.DataFrame, cache_file: str="addresses.json"):

    if os.path.isfile(cache_file): 
        cached_locations = gpd.read_file(cache_file)
        cached_addresses = cached_locations.address
    else:
        cached_locations = gpd.GeoDataFrame(columns=["address", "geometry"], crs="EPSG:4326")
        cached_addresses = []
        
    is_new_address = ~df.address.isin(cached_addresses)

    if any(is_new_address):
        new_locations = gpd.tools.geocode(df.address[is_new_address], user_agent="zazuko", provider="nominatim")
        new_locations["address"] = df.address[is_new_address]
        new_locations = new_locations.set_crs(epsg=4326)
        new_locations = new_locations[~new_locations.geometry.is_empty]
        all_locations = gpd.GeoDataFrame(pd.concat([new_locations, cached_locations], ignore_index=True), crs=new_locations.crs)
        all_locations.to_file(cache_file, driver="GeoJSON")
    else:
        all_locations = cached_locations

    return pd.merge(all_locations, df, on="address", how="right")
In [16]:
df.loc[: ,"address"] = df["thoroughfare"] + ", " + df["zip"].astype(str) + " " + df["town"] + ", Switzerland"
plot_df = geocode(df)

bins = [250, 500, 1000, df.ccount.max()]
plot_df.loc[:,"text"] = df["thoroughfare"] + ", " + df["town"] + "<br>Companies: " + plot_df.ccount.astype(int).astype(str)
classifier = mapclassify.UserDefined(y=df.ccount, bins=bins)
plot_df.loc[:, "buckets"] = plot_df[["ccount"]].apply(classifier) 

norm = mpl.colors.Normalize(vmin=0, vmax=len(bins))
colormap = mpl.cm.ScalarMappable(norm=norm, cmap=mpl.cm.inferno)
labels = mapclassify.classifiers._get_mpl_labels(classifier, fmt="{:.0f}")

fig = go.Figure()

for bucket in range(len(bins)):

    subset = plot_df[plot_df.buckets == bucket]

    #sizes = (subset.ccount*0.05)**0.9
    #sizes[sizes<7] = 7
    #sizes[sizes>50] = 50

    fig.add_trace(go.Scattermapbox(
        mode="markers",
        lat=subset.geometry.y,
        lon=subset.geometry.x,
        hovertext = subset.text,
        hoverinfo = "text",
        name=labels[bucket],
        marker={'size': 7.5, "color": "rgba{}".format(colormap.to_rgba(bucket+1))}, #((subset.ccount)*0.05)**0.9
    ))

fig.update_layout(
    margin={'l': 0, 't': 50, 'b': 0, 'r': 0},
    mapbox={
        'center': {"lat": 46.80515, "lon": 8.1336},
        'style': "carto-darkmatter",
        'zoom': 6.9},
    showlegend=True,
    legend_title="Registered companies",
    title_text='Where are most companies registered?', 
    title_x=0.5,
    width=980,
    height=600
)

fig.show("notebook")

Thank you for following along and hopefully this notebook was helpful. If you want to get in touch with us, please reach out to us via email.